9 research outputs found
Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation
Peer reviewed | Publisher PDF
Deep reinforcement learning has the potential to train robots to perform
complex tasks in the real world without requiring accurate models of the robot
or its environment. A practical approach is to train agents in simulation, and
then transfer them to the real world. One popular method for achieving
transferability is to use domain randomisation, which involves randomly
perturbing various aspects of a simulated environment in order to make trained
agents robust to the reality gap. However, less work has gone into
understanding such agents, which are deployed in the real world, beyond
task performance. In this work we examine such agents through qualitative
and
quantitative comparisons between agents trained with and without visual domain
randomisation. We train agents for Fetch and Jaco robots on a visuomotor
control task and evaluate how well they generalise using different testing
conditions. Finally, we investigate the internals of the trained agents by
using a suite of interpretability techniques. Our results show that the primary
outcome of domain randomisation is more robust, entangled representations,
accompanied by larger weights with greater spatial structure; moreover, the
types of changes are heavily influenced by the task setup and the presence of
additional proprioceptive inputs. Additionally, we demonstrate that our
domain-randomised agents exhibit higher sample complexity, can overfit, and
rely more heavily on recurrent processing. Furthermore, even with an improved
saliency method introduced in this work, we show that qualitative studies may
not always correspond to quantitative measures, necessitating a combination
of inspection tools to provide sufficient insight into the behaviour of
trained agents.
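The core idea of visual domain randomisation, as the abstract describes it, is to perturb the appearance of the simulator at every episode so the agent cannot overfit to any single rendering. A minimal sketch of that sampling step, assuming purely illustrative parameter names and ranges (the paper's actual randomised properties are not specified here):

```python
import random

# Hypothetical visual properties and ranges; the actual parameters
# randomised in the paper may differ.
VISUAL_RANGES = {
    "light_intensity": (0.2, 2.0),    # scene lighting scale
    "camera_jitter": (-0.05, 0.05),   # camera-position noise (metres)
    "texture_hue": (0.0, 1.0),        # hue shift applied to object textures
}

def randomise_visuals(rng: random.Random) -> dict:
    """Sample one visual configuration for a training episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in VISUAL_RANGES.items()}

# At the start of each simulated episode, the scene is re-rendered with a
# freshly sampled configuration, so the trained policy must be invariant
# to these visual changes rather than memorising one appearance.
rng = random.Random(0)
episode_visuals = [randomise_visuals(rng) for _ in range(3)]
```

Resampling per episode (rather than per run) is what forces the robustness, and incidentally the higher sample complexity, that the abstract reports.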
Learning from Demonstration in the Wild
Learning from demonstration (LfD) is useful in settings where hand-coding
behaviour or a reward function is impractical. It has succeeded in a wide range
of problems but typically relies on manually generated demonstrations or
specially deployed sensors and has not generally been able to leverage the
copious demonstrations available in the wild: those that capture behaviours
that were occurring anyway using sensors that were already deployed for another
purpose, e.g., traffic camera footage capturing demonstrations of natural
behaviour of vehicles, cyclists, and pedestrians. We propose Video to Behaviour
(ViBe), a new approach to learn models of behaviour from unlabelled raw video
data of a traffic scene collected from a single, monocular, initially
uncalibrated camera with ordinary resolution. Our approach calibrates the
camera, detects relevant objects, tracks them through time, and uses the
resulting trajectories to perform LfD, yielding models of naturalistic
behaviour. We apply ViBe to raw videos of a traffic intersection and show that
it can learn purely from videos, without additional expert knowledge.
Comment: Accepted to the IEEE International Conference on Robotics and
Automation (ICRA) 2019; extended version with appendix.
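The four-stage pipeline the abstract describes (calibrate, detect, track, then learn from the recovered trajectories) can be sketched schematically. Every function body below is a placeholder standing in for a real component, and all names are illustrative assumptions, not the authors' implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    object_id: int
    positions: list = field(default_factory=list)  # (x, y) per frame

def calibrate_camera(frames):
    # Placeholder: a real system would estimate a ground-plane mapping
    # from the initially uncalibrated footage.
    return {"homography": "identity"}

def detect_objects(frame):
    # Placeholder: a detector would return (object_id, position) pairs.
    return frame["detections"]

def track_objects(frames):
    # Link per-frame detections into per-object trajectories.
    tracks = {}
    for frame in frames:
        for obj_id, pos in detect_objects(frame):
            tracks.setdefault(obj_id, Track(obj_id)).positions.append(pos)
    return list(tracks.values())

def learn_from_demonstration(tracks):
    # Placeholder: fit a behaviour model to the recovered trajectories.
    return {"num_demonstrations": len(tracks)}

def vibe_pipeline(frames):
    calibrate_camera(frames)
    return learn_from_demonstration(track_objects(frames))

# Toy "footage": two objects observed over two frames.
frames = [
    {"detections": [(0, (0.0, 0.0)), (1, (5.0, 5.0))]},
    {"detections": [(0, (0.1, 0.0)), (1, (5.0, 4.9))]},
]
model = vibe_pipeline(frames)  # {"num_demonstrations": 2}
```

The point of the staging is that only the final step is learning; everything upstream converts passive, already-deployed sensor output into the trajectory format that standard LfD methods expect.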
Reverse-engineering the visual and haptic perceptual algorithms in the brain
Intelligent behaviour is fundamentally tied to the ability of the brain to make decisions in uncertain and dynamic environments. To accomplish this task successfully, the brain needs to categorise novel stimuli in real time. In neuroscience, the generative framework of Bayesian Decision Theory has emerged as a principled way to predict how the brain should act in the face of uncertainty. We sought to investigate whether the brain also uses generative Bayesian principles to implement its categorisation strategy. To this end, adopting tools from machine learning as a quantitative framework, we designed a novel experimental paradigm that allowed us to directly test this hypothesis in a variety of visual object categorisation tasks. Our results also carry new implications for existing models of human category learning and establish an ideal experimental paradigm for neurophysiological and functional imaging investigations of the neural mechanisms underlying object recognition.
We then turn to the problem of haptic object recognition, building on the belief that its underlying algorithms should resemble those of vision. Accordingly, we present a Bayesian ideal-observer model of human haptic perception and object reconstruction, which simultaneously infers the shape of an object and an estimate of the true hand pose in space from contact-point information on the surface of the hand and noisy hand proprioception. We implement this theory using a recursive Bayesian estimation algorithm, inspired by simultaneous localisation and mapping (SLAM) methods in robotics, which can operate on experimental data from human subjects as well as computer-based physical simulations. Our work enables the study of the haptic perception of complex objects and scenes in the same principled manner that transformed research in the field of vision.
Moreover, in conjunction with tactile-enabled prostheses, our model could allow for online object recognition and pose adaptation for more natural prosthetic control.
Open Access
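Recursive Bayesian estimation, the machinery the abstract borrows from SLAM, can be illustrated in its simplest scalar form: a Gaussian belief over one hand-pose coordinate, sharpened with each noisy measurement. The noise values and readings below are invented for the example; the actual model jointly estimates full hand pose and object shape:

```python
def bayes_update(mean, var, z, meas_var):
    """Fuse a Gaussian prior N(mean, var) with a measurement z
    carrying Gaussian noise of variance meas_var."""
    k = var / (var + meas_var)              # gain: trust in the new measurement
    return mean + k * (z - mean), (1 - k) * var

mean, var = 0.0, 1.0                        # broad initial belief about the pose
for z in [0.9, 1.1, 1.0, 0.95]:             # illustrative noisy proprioceptive readings
    mean, var = bayes_update(mean, var, z, meas_var=0.25)

# After the updates, the posterior mean has moved toward the readings
# (around 1.0) and the variance has shrunk well below its initial value.
```

Each update treats the previous posterior as the next prior, which is exactly what makes the scheme recursive and lets it run online as contact and proprioceptive data arrive.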